A Finite-State Morphological Analyser for Sindhi
Abstract
Morphological analysis is a fundamental task in natural-language processing and underpins other NLP applications such as part-of-speech tagging, syntactic parsing, information retrieval, and machine translation. In this paper, we present our work on the development of a free/open-source finite-state morphological analyser for Sindhi. We used Apertium’s lttoolbox as our finite-state toolkit to implement the transducer. The system is developed using a paradigm-based approach, wherein a paradigm defines all the word forms and their morphological features for a given stem (lemma). We evaluated our system on the Sindhi Wikipedia, a large, freely available corpus of Sindhi, and achieved a reasonable coverage of about 81% and a precision of over 97%.
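The paradigm-based approach and the coverage figure can be illustrated with a small sketch. This is not the authors' implementation: lttoolbox compiles dictionaries into finite-state transducers, whereas the sketch below uses a plain lookup table. The paradigm, the lexicon, and the toy corpus are hypothetical English-like placeholders rather than Sindhi data, and coverage is assumed here to mean the fraction of tokens that receive at least one analysis.

```python
from collections import defaultdict

# Hypothetical toy paradigm (English-like, NOT real Sindhi data):
# each entry pairs a surface suffix with the tags that suffix signals.
NOUN_PARADIGM = [
    ("", "<n><sg>"),
    ("s", "<n><pl>"),
]

# Hypothetical lexicon: stem (lemma) -> the paradigm it inflects by.
LEXICON = {"book": NOUN_PARADIGM, "river": NOUN_PARADIGM}


def build_analyser(lexicon):
    """Expand every stem through its paradigm into a surface-form lookup table."""
    table = defaultdict(list)
    for stem, paradigm in lexicon.items():
        for suffix, tags in paradigm:
            table[stem + suffix].append(stem + tags)
    return dict(table)


def naive_coverage(tokens, analyser):
    """Fraction of tokens that receive at least one analysis (assumed definition)."""
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in analyser)
    return known / len(tokens)


analyser = build_analyser(LEXICON)
corpus = ["books", "river", "rivers", "tree"]  # "tree" is not in the lexicon
print(analyser["books"])
print(naive_coverage(corpus, analyser))
```

Adding a new stem to the lexicon makes all of its inflected forms analysable at once, which is the main appeal of describing morphology through paradigms.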
Related resources
Finite State Morphology and Sindhi Noun Inflections
Sindhi is a morphologically rich language. Morphological constructions include inflections and derivations. Sindhi morphology becomes more complex due to primary and secondary word types, which are further divided into simple, complex and compound words. Sindhi nouns are marked by number, gender and case. Finite state transducers (FSTs) quite reasonably represent the inflectional morphology of Sin...
Developing language technology tools and resources for a resource-poor language: Sindhi
Sindhi, an Indo-Aryan language with more than 75 million native speakers, is a resource-poor language in terms of the availability of language technology tools and resources. In this thesis, we discuss the approaches taken to develop resources and tools for a resource-poor language, with special focus on Sindhi. The major contributions of this work include raw and annotated datasets, a POS Tagger,...
Fast Morphological Analysis of Czech
This paper presents a new Czech morphological analyser which takes advantage of Jan Daciuk’s algorithms for minimal deterministic acyclic finite state automata. The new analyser is six times faster than the current analyser ajka at analysis proper, i.e. returning possible lemmata and tags for a given word form, and for some other related tasks the difference is even bigger.
A Two-Level Morphological Analyser for the Indonesian Language
This paper presents our efforts at developing an Indonesian morphological analyser that provides a detailed analysis of the rich affixation process. We model Indonesian morphology using a two-level morphology approach, decomposing the process into a set of morphotactic and morphophonemic rules. These rules are modelled as a network of finite state transducers and implemented using xfst and lexc...
A Morphological Analyser for Machine Translation Based on Finite-state Transducers
A finite-state, rule-based morphological analyser is presented here, within the framework of the machine translation system TAVAL. This morphological analyser introduces specific features which are particularly useful for translation, such as the detection and morphological tagging of word groups that act as a single lexical unit for translation purposes. The case where words in one such group are ...
Publication date: 2016